Day 20: Working with docx Files in Python

11th iThome Ironman Contest

DAY 20

Self-Challenge Group

So Computers Can Be Used Like This!? A Programming Tutorial Series Even Fruit Flies Can Follow, Part 20

oxygenTW

Team 喵喵喵

2019-10-06 21:54:00

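Today I'm introducing a third-party package, python-docx, a tool for creating Word-format documents from Python. It is handy for generating documents in bulk, or for a web service that produces user data sheets, and so on. Today's post focuses mainly on creating documents.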

Preparing the virtual environment and installing the package

pipenv shell
pipenv install python-docx

Creating a docx document

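Following the example provided in the python-docx documentation, we can build a Word document with a title, formatted text, list items, and a table.

The complete code is as follows: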

from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

# Requires an image file named monty-truth.png; left commented out here.
# document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, item_id, desc in records:  # item_id avoids shadowing the built-in id()
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)  # cell text must be a string
    row_cells[1].text = item_id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')

Code walkthrough:

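We start by importing python-docx with from docx import Document (the package is installed as python-docx but imported as docx), then call the Document() constructor to create a Document object.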

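add_heading() adds a heading. Its second argument is the heading level: 0 produces the document title, and level=1 produces a top-level heading.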

document.add_paragraph() adds a paragraph. The optional style argument, such as 'List Bullet' or 'List Number', selects the paragraph style.

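add_run() appends a run of text to a paragraph and returns it; setting that run's bold or italic attribute applies special formatting such as bold or italics to just that run. It is a method of the paragraph, so it must be called on a paragraph object.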

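document.add_picture() inserts an image; the width argument (Inches(1.25) in the example) sets the width at which it is displayed.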

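document.add_table(rows=?, cols=?) creates a table; its arguments are the number of rows and the number of columns. In the example the table starts with a single header row, and the data held in the records tuple is filled in by calling add_row() once per record.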

document.add_page_break() inserts a page break.

document.save('demo.docx') saves the document to the file demo.docx.

Reading a docx document

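import docx

# Open an existing document; the argument is the path to the .docx file
Doc = docx.Document(r"D:\oxygen\Desktop\鐵人挑戰-程式教學\tmp\test.docx")

print("Number of paragraphs in the file:", len(Doc.paragraphs), "\n")

# Collect every Paragraph object from the document
testList = []
for paragraph in Doc.paragraphs:
    testList.append(paragraph)

# Print the text of each paragraph
for pg in testList:
    print(pg.text)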

As before, we first import the package and create a Document object; this time the constructor argument is the path to the file.

Doc.paragraphs returns the paragraphs that were read, as a list, so we first use len(Doc.paragraphs) to get the total number of paragraphs.

Next, a for loop walks through every paragraph in that list and stores it in a list of our own, and then we print them all out.

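Note:
While reading the file I kept running into docx.opc.exceptions.PackageNotFoundError: Package not found at PATH. I eventually realized that the Word document I had created had never been saved, so the file was empty. Once the file has content in it, the problem goes away.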
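A .docx file is a ZIP package, and my understanding is that python-docx raises this error when the path does not point to a valid ZIP file, which is the case for an empty file. Below is a rough pre-check using only the standard library. looks_like_docx is a hypothetical helper of my own; it only checks that the file is a ZIP archive, not that it is a valid Word document.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Return True if path is an existing file that is a ZIP archive (rough check only)."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


# Simulate the situation from the note: a .docx file that exists but is empty
with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, 'test.docx')
    open(empty_path, 'wb').close()  # create a zero-byte file

    empty_ok = looks_like_docx(empty_path)
    missing_ok = looks_like_docx(os.path.join(tmp_dir, 'missing.docx'))

print(empty_ok, missing_ok)
```

Running a check like this before docx.Document(path) makes it easier to tell a wrong path apart from an empty or corrupted file.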